Abstract

Let $X_1, \ldots, X_m$ be a set of $m$ statistically dependent sources over the common alphabet $\mathbb{F}_q$, that are linearly independent when considered as functions over the sample space. We consider a distributed function computation setting in which the receiver is interested in the lossless computation of the elements of an $s$-dimensional subspace $W$ spanned by the elements of the row vector $[X_1, \ldots, X_m]\Gamma$, in which the $(m \times s)$ matrix $\Gamma$ has rank $s$. A sequence of three increasingly refined approaches is presented, all based on linear encoders.

The first approach uses a common matrix to encode all the sources and a Körner-Marton like receiver to directly compute $W$. The second improves upon the first by showing that it is often more efficient to compute a carefully chosen superspace $U$ of $W$. The superspace is identified by showing that the joint distribution of the $\{X_i\}$ induces a unique decomposition of the set of all linear combinations of the $\{X_i\}$ into a chain of subspaces identified by a normalized measure of entropy. This subspace chain also suggests a third approach, one that employs nested codes. For any joint distribution of the $\{X_i\}$ and any $W$, the sum rate of the nested code approach is no larger than that under the Slepian-Wolf (SW) approach. Under the SW approach, $W$ is computed by first recovering each of the $\{X_i\}$. For a large class of joint distributions and subspaces $W$, the nested code approach is shown to improve upon SW. Additionally, a class of source distributions and subspaces is identified for which the nested code approach is sum-rate optimal.

I. Introduction 

In [1], Körner and Marton consider a distributed source coding problem with two discrete memoryless binary sources $X_1$ and $X_2$ and a receiver interested in recovering their modulo-two sum $Z = X_1 + X_2 \bmod 2$. An obvious approach to this problem would be to first recover both $X_1$ and $X_2$ using a Slepian-Wolf encoder [2] and then compute their modulo-two sum, thus yielding a sum rate of $H(X_1, X_2)$. Körner and Marton present an interesting alternative approach in which they first select a $(k \times n)$ binary matrix $A$ that is capable of efficiently compressing $\mathbf{Z} = \mathbf{X}_1 + \mathbf{X}_2 \bmod 2$, where $\mathbf{X}_1$, $\mathbf{X}_2$ and $\mathbf{Z}$ correspond to i.i.d. $n$-tuple realizations of $X_1$, $X_2$ and $Z$ respectively. The two sources then transmit $A\mathbf{X}_1$ and $A\mathbf{X}_2$ respectively. The receiver first computes $A\mathbf{X}_1 + A\mathbf{X}_2 = A\mathbf{Z} \bmod 2$ and then recovers $\mathbf{Z}$ from $A\mathbf{Z}$. Since optimal linear compression of a finite field discrete memoryless source is possible [3], the compression rate $\frac{k}{n}$ associated with $A$ can be chosen to be as close as desired to $H(Z)$, thereby implying the achievability of the sum rate $2H(Z)$ for this problem. For a class of symmetric distributions, it is shown that this rate not only improves upon the sum rate, $H(X_1, X_2)$, incurred under the Slepian-Wolf approach, but is also optimum.
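The linearity that the Körner-Marton receiver relies on can be checked numerically. The following is a minimal sketch, not the scheme of [1] itself: the matrix `A` is drawn at random purely for illustration (the paper requires a matrix that compresses $\mathbf{Z}$ well), and the sizes `n`, `k` and the helper `matvec_mod2` are assumptions introduced here.

```python
import random

def matvec_mod2(A, x):
    """Multiply the binary matrix A (list of rows) by the vector x over F_2."""
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in A]

rng = random.Random(0)          # fixed seed so the run is repeatable
n, k = 12, 5                    # illustrative block length and number of rows
A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]

x1 = [rng.randint(0, 1) for _ in range(n)]   # realization of source 1
x2 = [rng.randint(0, 1) for _ in range(n)]   # realization of source 2
z = [(a + b) % 2 for a, b in zip(x1, x2)]    # modulo-two sum

# Each encoder sends A x_i; the receiver adds the two messages mod 2.
s1, s2 = matvec_mod2(A, x1), matvec_mod2(A, x2)
s_sum = [(a + b) % 2 for a, b in zip(s1, s2)]

# The receiver now holds exactly A z, without having learned x1 or x2.
print(s_sum == matvec_mod2(A, z))
```

The check only confirms that the receiver obtains $A\mathbf{Z}$; recovering $\mathbf{Z}$ from $A\mathbf{Z}$ additionally requires a suitable choice of $A$ and a decoder, which this sketch does not attempt.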

In this paper, we consider a natural generalization of the above problem when there are more than two statistically 
dependent sources and a receiver interested in recovering multiple linear combinations of the sources. Our interest 
is in finding achievable sum rates for the problem and we restrict ourselves to linear encoding in all our schemes. 
A. System Model

Consider a distributed source coding problem involving $m$ sources $X_1, \ldots, X_m$ and a receiver that is interested in the lossless computation (i.e., computation with arbitrarily small probability of error) of a function of these sources. All sources are assumed to take values from a common alphabet, the finite field $\mathbb{F}_q$ of size $q$. The sources are assumed to be memoryless and to possess a time-invariant joint distribution given by $P_{X_1 \ldots X_m}$. We will assume this joint distribution to be "linearly non-degenerate", by which we mean that when the random variables $\{X_i, 1 \le i \le m\}$ are regarded as functions over the sample space $\Omega$, they are linearly independent, i.e.,

$$\sum_{i=1}^{m} a_i X_i(\omega) = 0, \quad \text{all } \omega \in \Omega, \ a_i \in \mathbb{F}_q,$$

iff $a_i = 0$, all $i$. For simplicity in notation, we will henceforth drop $\omega$ in the notation. By identifying the linear combination $\sum_{i=1}^{m} a_i X_i$ with the vector $[a_1 \ a_2 \ \cdots \ a_m]^T$, we see that the vector space $V$ of all possible linear combinations of the $\{X_i\}$ can be identified with $\mathbb{F}_q^m$.
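For small binary examples, linear non-degeneracy can be checked by brute force. The sketch below is an illustration under stated assumptions: it works over $\mathbb{F}_2$ only, it treats the outcomes of positive probability as the sample space, and the function name and the two pmfs are hypothetical.

```python
from itertools import product

def is_linearly_nondegenerate(pmf, m):
    """True if no non-zero a in F_2^m makes sum_i a_i X_i vanish (mod 2)
    on every outcome of positive probability."""
    support = [x for x, p in pmf.items() if p > 0]
    for a in product((0, 1), repeat=m):
        if not any(a):
            continue  # skip the all-zero coefficient vector
        if all(sum(ai * xi for ai, xi in zip(a, x)) % 2 == 0 for x in support):
            return False  # this combination is identically zero
    return True

# Hypothetical joint pmfs of (X_1, X_2) over F_2, for illustration only.
pmf_ok = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
pmf_bad = {(0, 0): 0.5, (1, 1): 0.5}   # X_1 = X_2 always, so X_1 + X_2 = 0

print(is_linearly_nondegenerate(pmf_ok, 2), is_linearly_nondegenerate(pmf_bad, 2))
```

The search visits all $2^m - 1$ non-zero coefficient vectors, so it is only practical for small $m$.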

The function of interest at the receiver is assumed to be the set of $s$ linear combinations $\{Z_i \mid i = 1, 2, \ldots, s\}$ of the $m$ sources given by:

$$[Z_1, \ldots, Z_s] = [X_1, \ldots, X_m]\Gamma, \tag{1}$$

in which $\Gamma$ is an $(m \times s)$ matrix over $\mathbb{F}_q$ of full rank $s$ and where matrix multiplication is over $\mathbb{F}_q$. Note that a receiver which can losslessly compute $\{Z_i, i = 1, \ldots, s\}$ can also compute any linear combination $\sum_{i=1}^{s} \beta_i Z_i$, $\beta_i \in \mathbb{F}_q$, of these random variables. The set of all linear combinations of the $\{Z_i, i = 1, \ldots, s\}$ forms a subspace $W$ of the vector space $V$, which can be identified with the column space of the matrix $\Gamma$. This explains the phrase 'computation of subspaces' appearing in the title. Throughout this paper, we will interchangeably refer to the $\{X_i, i = 1, \ldots, m\}$ as random variables (when they refer to sources) and as vectors (when they are considered as functions on the sample space). We will also write $V = \langle X_1, \ldots, X_m \rangle$ to mean that $V$ is generated by the vectors $\{X_i, 1 \le i \le m\}$. The same convention applies to other random variables and their vector interpretations.







Fig. 1. The common structure for all three approaches to subspace computation.



Encoder: All the encoders will be linear and will operate on $n$-length i.i.d. realizations of the corresponding sources $X_i$, $1 \le i \le m$. Thus the $i$th encoder will map the $n$-length vector $\mathbf{X}_i$ to $A_i^{(n)}\mathbf{X}_i$ for some $(k_i \times n)$ matrix $A_i^{(n)}$ over $\mathbb{F}_q$. The rate of the encoder, in bits per symbol, is thus given by

$$R_i^{(n)} = \frac{k_i}{n} \log q. \tag{2}$$

Receiver: The receiver is presented with the problem of losslessly recovering the $\{\mathbf{Z}_i, 1 \le i \le s\}$, which are $n$-length extensions of the random variables $\{Z_i, 1 \le i \le s\}$ defined in (1), from the $\{A_i^{(n)}\mathbf{X}_i, 1 \le i \le m\}$. Let $\mathbf{W}$ be the space spanned by the $\{\mathbf{Z}_i, 1 \le i \le s\}$. Then lossless recovery of the $\{\mathbf{Z}_i\}$ amounts to lossless computation of the subspace $\mathbf{W}$. Thus in the present notation, $\mathbf{W}$ is to $W$ as $\mathbf{Z}_i$ is to $Z_i$.

For $1 \le i \le s$, let $\hat{\mathbf{Z}}_i$ denote the receiver's estimate of $\mathbf{Z}_i$. We will use $P_e^{(n)}$ to denote the probability of error in decoding, i.e.,

$$P_e^{(n)} = P\left( (\hat{\mathbf{Z}}_1 \ldots \hat{\mathbf{Z}}_s) \neq (\mathbf{Z}_1 \ldots \mathbf{Z}_s) \right). \tag{3}$$
Achievability: A rate tuple $(R_1, \ldots, R_m)$ is said to be achievable if, for any $\delta > 0$, there exists a sequence of matrix encoders $\{A_1^{(n)}, \ldots, A_m^{(n)}\}$ (and receivers) such that $R_i^{(n)} \le R_i + \delta$, $1 \le i \le m$, for sufficiently large $n$, and $\lim_{n \to \infty} P_e^{(n)} = 0$. A sum rate $R$ will be declared as being achievable whenever $R = \sum_{i=1}^{m} R_i$ for some achievable rate tuple $(R_1, \ldots, R_m)$. By rate region we will mean the closure of the set of achievable rate $m$-tuples. In situations where all encoders employ a common matrix encoder, we will use the term minimum symmetric rate to mean the minimum of all values $R$ such that the symmetric point $(R_1 = R, \ldots, R_m = R)$ lies in the rate region.

B. Our Work 

In this paper, we present three successive approaches to the subspace computation problem along with explicit 
characterization of the corresponding achievable sum rates. As illustrated in Fig. 1, all three approaches will use
linear encoders, but will differ in their selection of encoding matrices $A_i$. We provide an overview here, of the
three approaches along with a brief description of their achievable sum rates, and some explanation for the relative 
performance of the three approaches. Details and proofs appear in subsequent sections. 

Common Code (CC) approach: Under this approach, the encoding matrices of all the sources are assumed to be identical, i.e., $A_i = A$, $1 \le i \le m$. It is also assumed that the receiver decodes $[\mathbf{Z}_1 \ldots \mathbf{Z}_s]$ by first computing (Fig. 2)

$$[A\mathbf{Z}_1 \ldots A\mathbf{Z}_s] = [A\mathbf{X}_1 \ldots A\mathbf{X}_m]\Gamma, \tag{4}$$

and thereafter processing the $\{A\mathbf{Z}_i\}$. Thus the CC approach could be regarded as the analogue of the Körner-Marton approach for the modulo-two sum of two sources described earlier. The minimum symmetric rate under this approach is characterized in Theorem 1.

Fig. 2. The Common Code approach.

Selected Subspace (SS) approach: It turns out, interestingly, that compression rates can often be improved by using the CC approach to compute a larger subspace $U \subseteq V$ that contains the desired subspace $W$, i.e., $W \subseteq U$. We will refer to this variation of the common code approach, under which we compute the superspace $U$ of $W$, as the Selected Subspace (SS) approach. Thus when we speak of the SS approach, we will mean the selected-subspace variation of the common-code approach. We will present in the sequel (Theorem 5) an analytical means of determining, for a given subspace $W$, the best subspace $U \supseteq W$ upon which to apply the CC approach. This is accomplished by showing (Theorem 3) that the joint distribution of the $\{X_i\}$ induces a unique decomposition of the $m$-dimensional space $V$ into a chain of subspaces identified by a normalized measure of entropy. Given this subspace chain, it is a simple matter to determine the optimum subspace $U$ containing the desired subspace $W$.

Example 1: Consider a setting where there are 4 sources, $X_1, \ldots, X_4$, having a common alphabet $\mathbb{F}_2$, whose joint distribution is described as follows:

(5)



where $\{Y_i\}_{i=1}^{4}$ are independent random variables such that $Y_1, Y_2 \sim$ Bernoulli$(p_1)$, $Y_3 \sim$ Bernoulli$(p_2)$, $Y_4 \sim$ Bernoulli$(\frac{1}{2})$, $0 < p_1 < p_2 < \frac{1}{2}$. Assume that the receiver is interested in decoding the single linear combination $Z = X_1 + X_2 + X_3 + X_4 \bmod 2$. In the subspace notation, this is equivalent to decoding the one-dimensional space $W = \langle Z \rangle$. The CC approach would choose the common encoding matrix $A$ so as to compress $Z$ to its entropy, $H(Z)$, thus yielding a sum rate

$$R_{CC}^{(sum)}(W) = 4H(Z) = 4H(Y_1 + Y_3). \tag{6}$$

It will be shown in Theorem 3 that there is a unique subspace chain decomposition of $V = \langle X_1, X_2, X_3, X_4 \rangle$ given by $\{0\} \subsetneq W^{(1)} \subsetneq W^{(2)} \subsetneq W^{(3)} = V$, where

$$W^{(1)} = \langle X_1 + X_2, \ X_2 + X_3 \rangle, \qquad W^{(2)} = \langle X_1 + X_2, \ X_2 + X_3, \ X_3 + X_4 \rangle. \tag{7}$$

With respect to this chain of subspaces, the best superspace $U \supseteq W$ to consider under the SS approach is the smallest subspace in the chain that contains $W$, which in this case is $U = W^{(2)}$ (Fig. 3). The sum rate in this case, identified by Theorem 5, turns out to be given by

$$R_{SS}^{(sum)}(W) = 4H(Y_3) < R_{CC}^{(sum)}(W), \tag{8}$$

where the inequality in (8) follows as the $\{Y_i\}$ are independent, so that $H(Y_1 + Y_3) > H(Y_3)$.



Fig. 3. The subspace chain decomposition of $V$ in Example 1 and its application in determining the optimal subspace $U$ to compute $W$, under the SS approach.

Nested Codes (NC) approach: This approach is motivated by the subspace-chain decomposition and may be viewed as uniting under a common framework both the CC approach of using a common linear encoder and the SW approach of employing different encoders at each source. To illustrate the approach, we continue to work with Example 1. Under the NC approach, the decoding happens in two stages. In the first stage the receiver, using the CC approach, decodes the subspace $W^{(1)}$. In the next stage, using $W^{(1)}$ as side information, $W^{(2)}$ is decoded (using a modified CC approach which incorporates side information). The encoding matrices of the various sources are as shown in Fig. 4. The matrix $B_1$ appearing in the figure is the common encoding matrix that would be used if it was desired to compute subspace $W^{(1)}$ alone. The block matrix $[B_1^T \ B_2^T]^T$ is the common encoder that would have been used if one were only interested in computing the complement of $W^{(1)}$ in $W^{(2)}$, with $W^{(1)}$ as side information. It can be shown that there is a rearrangement of the $\{X_i\}$ under which the complement of $W^{(1)}$ in $W^{(2)}$ can be made to be a function only of two of the random variables which, we have assumed without loss of generality here, are $X_3, X_4$. This explains why the submatrix $B_2$ appears only in the encoding of sources $X_3, X_4$.
Fig. 4. Encoder structure under NC approach for decoding the subspace $W$ in Example 1.



The sum rate in this case turns out to be given by

$$R_{NC}^{(sum)}(W) = 2H(Y_1) + 2H(Y_3),$$

which can be shown to be less than $R_{SS}^{(sum)}(W)$ as well as the sum rate, $R_{SW}^{(sum)} = H(X_1, X_2, X_3, X_4)$, of the Slepian-Wolf (SW) approach under which the subspace $W$ is computed by first recovering each of the four random variables $\{X_i\}$.
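The four sum rates quoted for Example 1 can be compared numerically. The sketch below is only illustrative: the values of `p1` and `p2` are arbitrary choices satisfying $0 < p_1 < p_2 < \frac{1}{2}$, and the SW rate is evaluated under the assumption that $X_1, \ldots, X_4$ are an invertible linear transformation of $Y_1, \ldots, Y_4$, so that $H(X_1, \ldots, X_4) = H(Y_1, \ldots, Y_4)$.

```python
from math import log2

def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def example1_sum_rates(p1, p2):
    # P(Y_1 + Y_3 = 1) for independent Y_1 ~ Bernoulli(p1), Y_3 ~ Bernoulli(p2)
    p_sum = p1 * (1 - p2) + p2 * (1 - p1)
    return {
        "CC": 4 * h(p_sum),               # 4 H(Y_1 + Y_3), eq. (6)
        "SS": 4 * h(p2),                  # 4 H(Y_3), eq. (8)
        "NC": 2 * h(p1) + 2 * h(p2),      # 2 H(Y_1) + 2 H(Y_3)
        # Assumption: H(X_1..X_4) = H(Y_1..Y_4) = 2 h(p1) + h(p2) + 1
        "SW": 2 * h(p1) + h(p2) + 1.0,
    }

rates = example1_sum_rates(0.05, 0.1)     # illustrative probabilities
for name, value in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.3f} bits")
```

For these particular probabilities the NC rate is the smallest of the four; other choices of `p1`, `p2` change the ordering between the CC and SW rates.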

A graphical depiction of the sum rates achieved by the various schemes is provided in Fig. 5, with the sum rates
appearing on the vertical axis on the far right. It turns out that in general, we have 

In the particular case of the example, we have that there exist choices of probabilities pi,p2 such that 
Fig. 5. Illustrating sum-rate calculation for the various approaches to computing $W$.



C. Other Related Work 

Some early work on distributed function computation can be found in [4], [5], [6]. In [4], the general problem of distributed function computation involving two sources is considered and the authors identify conditions on the function such that the SW approach itself is optimal. In [5], the authors address the Körner-Marton problem of computing the modulo-two sum of two binary sources and produce an achievable rate pair which is strictly outside the time-shared region of the SW and Körner-Marton rate regions.¹ For the same problem it is shown in [6] that if $H(X_1 + X_2) > \min(H(X_1), H(X_2))$, then the SW scheme is sum-rate optimal. In [7], the authors showed how linear encoders are suitable for recovery of functions that are representable as addition operations within an Abelian group.

The problem of compressing a source $X_1$, when $X_2$ is available as side information to the receiver, and where the receiver is interested in decoding a function $f(X_1, X_2)$ under a distortion constraint, is studied in [8]. For the case of zero distortion, the minimum rate of compression is shown to be related to the conditional graph entropy of the corresponding characteristic graph in [9]. The extension of the nonzero distortion problem to the case of noisy source and side information measurements is investigated in [10].

In [11], Doshi et al. consider the lossless computation of a function of two correlated, but distributed sources. They present a two-stage architecture, wherein in the first stage, the input sequence at each source is divided into blocks of length $n$ and each block is coloured based on the corresponding characteristic graph at the source. In the second stage, SW coding is used to compress the coloured data obtained at the output of the first stage. The achievable rate region using this scheme is given in terms of a multi-letter characterization and the optimality of the scheme is shown for a certain class of distributions. In [12], the authors derive inner and outer bounds for lossless compression of two distributed sources $X, Y$ to recover a function $f(X, Y, Z)$, when $Z$ is available as side information to the receiver. The bound is shown to be tight for partially invertible functions, i.e., for functions $f$ such that $X$ is a function of $f(X, Y, Z)$ and $Z$.

For the case of two distributed Gaussian sources, computation of a linear combination of the sources is studied in [13], [14], wherein lattice-based schemes are shown to provide a rate advantage. Zero-error function computation in a network setting has been investigated in [15], [16], [17].

A notion of normalized entropy is introduced in Section II. Section III discusses the rate regions under the CC and SS approaches. The unique decomposition of the $m$-dimensional space $V$ into a chain of subspaces identified by a normalized measure of entropy is presented in Section IV. It is shown how this simplifies determination of the minimum symmetric rate under the SS approach. An example subspace computation along with an attendant class of distributions for which the SS approach is optimal are also presented there. The nested-code approach is presented in Section V, along with conditions under which this approach improves upon the SW approach as well as examples for which the NC approach is sum-rate optimal. Most proofs are relegated to the appendices.

II. Normalized Entropy 
We will use $\rho_U$ to denote the dimension of a subspace $U$.

Entropy of a subspace: To every subspace $U$ of $V$, we will associate an entropy, which is the entropy of any set of random variables that generate $U$. We will denote this quantity by $\mathcal{H}(U)$ and refer to this quantity loosely as the entropy of the subspace² $U$. Thus, if $U = \langle Y_1, \ldots, Y_{\rho_U} \rangle$,

$$\mathcal{H}(U) = H(Y_1, \ldots, Y_{\rho_U}). \tag{9}$$

$\mathcal{H}(U)$ can also be viewed as the joint entropy of the collection of all random variables contained in the subspace $U$, i.e., $\mathcal{H}(U) := H(\{U\})$. Next, given any two subspaces $U_1$ and $U_2$, we define the conditional entropy of the subspace $U_2$ conditioned on $U_1$ as

$$\mathcal{H}(U_2 | U_1) \triangleq H(\{U_2\} | \{U_1\}). \tag{10}$$

Let $U_1 + U_2$ denote the sum space of $U_1, U_2$. Clearly, $\mathcal{H}(U_1 + U_2) = H(\{U_1\}, \{U_2\})$. Hence we can rewrite the above equation as

$$\mathcal{H}(U_2 | U_1) = \mathcal{H}(U_1 + U_2) - \mathcal{H}(U_1). \tag{11}$$

¹The sum rate of this achievable rate pair is, however, still larger than the minimum of the SW and Körner-Marton sum rates.
²We have used $\mathcal{H}(U)$ in place of $H(U)$ so as to avoid confusion with the entropy of a random variable whose every realization is a subspace.
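Normalized entropy: We define the normalized entropy $\mathcal{H}_N(U)$ of a non-zero subspace $U$ of $V$ as the entropy of $U$ normalized by its dimension, i.e.,

$$\mathcal{H}_N(U) \triangleq \frac{\mathcal{H}(U)}{\rho_U}. \tag{12}$$

For any pair of subspaces $U_1, U_2$ with $U_2 \nsubseteq U_1$, we define the normalized conditional entropy of $U_2$ conditioned on $U_1$ to be given by

$$\mathcal{H}_N(U_2 | U_1) \triangleq \frac{\mathcal{H}(U_2 | U_1)}{\rho_{U_2} - \rho_{U_1 \cap U_2}}.$$

Note that since $\rho_{U_1 + U_2} = \rho_{U_1} + \rho_{U_2} - \rho_{U_1 \cap U_2}$, we can equivalently write

$$\mathcal{H}_N(U_2 | U_1) = \frac{\mathcal{H}(U_2 + U_1) - \mathcal{H}(U_1)}{\rho_{U_2 + U_1} - \rho_{U_1}}. \tag{13}$$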
Fig. 6. An illustration of normalized entropies.

The above definitions are illustrated in Fig. 6, where the $x$-axis corresponds to the dimension of subspaces and the $y$-axis corresponds to the entropy of subspaces. The slope of the line $L_1$ is the normalized entropy, $\mathcal{H}_N(U_1)$, of $U_1$ and the slope of the line $L_2$ is the normalized conditional entropy, $\mathcal{H}_N(U_2 | U_1)$.
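The quantities in (9) to (13) can be evaluated directly for small binary examples. The following is a minimal sketch under stated assumptions: the alphabet is $\mathbb{F}_2$, a subspace is given by 0/1 coefficient tuples over the sources, and the function names and the doubly-symmetric pmf are hypothetical choices made here for illustration.

```python
from math import log2
from collections import defaultdict

def rank_f2(vectors):
    """Rank over F_2 of 0/1 tuples, using an XOR basis on integer bitmasks."""
    basis = []
    for v in vectors:
        x = int("".join(map(str, v)), 2)
        for b in basis:
            x = min(x, x ^ b)      # clears the leading bit of b from x if set
        if x:
            basis.append(x)
    return len(basis)

def subspace_entropy(pmf, gens):
    """H(U) in bits, where U is spanned by the combinations listed in gens."""
    image = defaultdict(float)
    for x, p in pmf.items():
        y = tuple(sum(g_i * x_i for g_i, x_i in zip(g, x)) % 2 for g in gens)
        image[y] += p
    return -sum(p * log2(p) for p in image.values() if p > 0)

def normalized_entropy(pmf, gens):
    """Eq. (12): entropy of U divided by its dimension."""
    return subspace_entropy(pmf, gens) / rank_f2(gens)

def normalized_conditional_entropy(pmf, gens2, gens1):
    """Eq. (13): (H(U_2 + U_1) - H(U_1)) / (rho_{U_2 + U_1} - rho_{U_1})."""
    both = list(gens1) + list(gens2)
    num = subspace_entropy(pmf, both) - subspace_entropy(pmf, gens1)
    return num / (rank_f2(both) - rank_f2(gens1))

# Hypothetical pmf of (X_1, X_2): X_1 uniform, X_2 = X_1 + N, N ~ Bernoulli(0.1).
pmf = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
U1 = [(1, 1)]                 # the subspace <X_1 + X_2>
V = [(1, 0), (0, 1)]          # the whole space <X_1, X_2>
print(normalized_entropy(pmf, U1), normalized_conditional_entropy(pmf, V, U1))
```

Here the first value is the slope of a line like $L_1$ in Fig. 6 and the second the slope of a line like $L_2$; the caller must ensure $U_2 \nsubseteq U_1$ so that the denominator is non-zero.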



III. Common Code and Selected Subspace Approaches 

The minimum symmetric rates of the CC and the SS approaches to the distributed subspace computation problem described in Section I-B are presented here.



A. Rate Region Under the CC Approach 

Theorem 1: Consider the distributed source coding setting shown in Fig. 2, where there are $m$ correlated sources $X_1, \ldots, X_m$ and a receiver that is interested in decoding the $s$-dimensional subspace $W$ corresponding to the space spanned by the set $\{Z_i\}$ of random variables defined in (1). Then the minimum symmetric rate under the CC approach is given by

$$R_{CC}(W) = \max_{W_1 \subsetneq W} \mathcal{H}_N(W | W_1). \tag{15}$$

Proof: See Appendix A. ■

The best sum rate $R_{CC}^{(sum)}(W)$ under the CC approach is given by $R_{CC}^{(sum)}(W) = m R_{CC}(W)$. Note that in the special case when the receiver is interested in just a single linear combination, $Z$, of all the sources, the sum rate is simply $mH(Z)$. The following example illustrates the minimum symmetric rate for the case when the receiver is interested in decoding a two-dimensional subspace.
Example 2: Let $m = 3$ and the source alphabet be $\mathbb{F}_2$. Consider a receiver interested in computing $Z_1 = X_1$, $Z_2 = X_2 + X_3$, i.e., $W = \langle X_1, X_2 + X_3 \rangle$. Then the minimum symmetric rate under the CC approach is given by

$$R_{CC}(W) = \max\left\{ \frac{H(Z_1, Z_2)}{2}, \ H(Z_1, Z_2 | Z_1), \ H(Z_1, Z_2 | Z_2), \ H(Z_1, Z_2 | Z_1 + Z_2) \right\}. \tag{16}$$
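The four terms of (16) correspond to the four proper subspaces of a two-dimensional binary space, and they can be evaluated from the joint pmf of $(Z_1, Z_2)$ alone. The sketch below assumes a hypothetical pmf in which $Z_1$ and $Z_2$ are independent; the helper names are introduced here and are not from the paper.

```python
from math import log2
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def pushforward(pmf, f):
    """Distribution of f(Z) when Z is distributed according to pmf."""
    out = defaultdict(float)
    for z, p in pmf.items():
        out[f(z)] += p
    return out

def rcc_two_dim(pmf_z):
    """Evaluate the maximum in (16) from the joint pmf of (Z_1, Z_2) over F_2."""
    h_w = entropy(pmf_z)
    h_z1 = entropy(pushforward(pmf_z, lambda z: z[0]))
    h_z2 = entropy(pushforward(pmf_z, lambda z: z[1]))
    h_sum = entropy(pushforward(pmf_z, lambda z: (z[0] + z[1]) % 2))
    # H(Z_1, Z_2 | T) = H(Z_1, Z_2) - H(T) when T is a function of (Z_1, Z_2)
    return max(h_w / 2, h_w - h_z1, h_w - h_z2, h_w - h_sum)

# Hypothetical case: Z_1 ~ Bernoulli(1/2) independent of Z_2 ~ Bernoulli(0.1).
pmf_z = {(a, b): 0.5 * (0.1 if b else 0.9) for a in (0, 1) for b in (0, 1)}
r_cc = rcc_two_dim(pmf_z)
print(round(r_cc, 6))
```

In this illustrative case the maximum is attained by the term $H(Z_1, Z_2 | Z_2) = H(Z_1)$, so the common encoder must operate at the rate of the less compressible of the two combinations.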

Remark 1: The CC approach is sum-rate optimal for the case when $W = V$ iff

$$\mathcal{H}_N(V) \ge \mathcal{H}_N(V | V_1), \quad \forall \, V_1 \subsetneq V. \tag{17}$$

This follows directly from Theorem 1 by setting $W = V$ and noting that the optimal sum rate in this case is simply $H(X_1, \ldots, X_m)$.



B. Rate Region Under the SS Approach 

This approach recognizes that it is often more efficient to compute a superspace of $W$ rather than $W$ itself. The identification of the particular superspace that offers the greatest savings in compression rate is taken up in Section IV.

Theorem 2: Under the same setting as in Theorem 1, the minimum symmetric rate under the SS approach is given by

$$R_{SS}(W) = \min_{U \supseteq W} \ \max_{U_1 \subsetneq U} \mathcal{H}_N(U | U_1). \tag{18}$$

Proof: Follows directly from Theorem 1. ■

The (best) sum rate $R_{SS}^{(sum)}(W)$ under the SS approach is given by $R_{SS}^{(sum)}(W) = m R_{SS}(W)$. Any subspace $U \supseteq W$ which minimizes $\max_{U_1 \subsetneq U} \mathcal{H}_N(U | U_1)$ will be referred to as an optimal subspace for computing $W$ under the SS approach. There can be more than one optimal subspace associated with a given $W$.



IV. A Decomposition Theorem for the Vector Space V Based on Normalized Entropy 

While the results of this section are used to identify the superspace $U$ that minimizes the quantity $\max_{U_1 \subsetneq U} \mathcal{H}_N(U | U_1)$ appearing in Theorem 2, they are also of independent interest as they exhibit an interesting interplay between linear algebra and probability theory. Also included in this section are example subspace-computation problems and a class of distributions for which the SS approach is sum-rate optimal, while the CC and the SW approaches are not.

Theorem 3 (Normalized-Entropy Subspace Chain): In the vector space $V$, there exists, for some $r \le m$, a unique, strictly increasing sequence of subspaces $\{0\} = W^{(0)} \subsetneq W^{(1)} \subsetneq \cdots \subsetneq W^{(r)} = V$, such that, $\forall j \in \{1, \ldots, r\}$,

1) amongst all the subspaces of $V$ that strictly contain $W^{(j-1)}$, $W^{(j)}$ has the least possible value of normalized conditional entropy conditioned on $W^{(j-1)}$, and

2) if any other subspace that strictly contains $W^{(j-1)}$ also has the least value of normalized conditional entropy conditioned on $W^{(j-1)}$, then that subspace is strictly contained in $W^{(j)}$.

Furthermore,

$$\mathcal{H}_N(W^{(1)} | W^{(0)}) < \mathcal{H}_N(W^{(2)} | W^{(1)}) < \ldots < \mathcal{H}_N(W^{(r)} | W^{(r-1)}). \tag{19}$$

Proof: See Appendix B. ■

We illustrate Theorem 3 below, by identifying the chain of subspaces $\{W^{(j)}\}$ for the case when the random variables $X_1, \ldots, X_m$ are derived via an invertible linear transformation of a set of $m$ statistically independent random variables $Y_1, \ldots, Y_m$.
Lemma 4: Let $[X_1, \ldots, X_m] = [Y_1, \ldots, Y_m]G$, where $G$ is an $(m \times m)$ invertible matrix over $\mathbb{F}_q$ and $\{Y_i, i = 1, \ldots, m\}$ are $m$ independent random variables, each of which takes values in the finite field $\mathbb{F}_q$. Without loss of generality, let the entropies of $\{Y_i, i = 1, \ldots, m\}$ be ordered according to

$$0 < H(Y_1) = \ldots = H(Y_{\ell_1}) < H(Y_{\ell_1 + 1}) = \ldots = H(Y_{\ell_1 + \ell_2}) < \ldots < H(Y_{\sum_{i=1}^{r-1} \ell_i + 1}) = \ldots = H(Y_{\sum_{i=1}^{r} \ell_i}), \tag{20}$$

where $1 \le \ell_i \le m$, $i = 1, \ldots, r$, and $\sum_{i=1}^{r} \ell_i = m$. Then, the unique subspace chain identified by Theorem 3 is given by

$$\{0\} \subsetneq \langle Y_1, \ldots, Y_{\ell_1} \rangle \subsetneq \langle Y_1, \ldots, Y_{\ell_1 + \ell_2} \rangle \subsetneq \ldots \subsetneq \langle Y_1, \ldots, Y_m \rangle. \tag{21}$$

Proof: See Appendix C. ■

Remark 2: While Theorem 3 guarantees the existence of $r$, the above lemma shows that $r$ can take any value between 1 and $m$ depending on the joint distribution of the $\{X_i, 1 \le i \le m\}$.
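In the setting of Lemma 4, the chain is determined by grouping the $Y_i$ into levels of equal entropy. The following sketch assumes the entropies $H(Y_i)$ are supplied as numbers; the function `lemma4_chain` and the tolerance used to decide equality are hypothetical choices made here. Each returned pair holds the dimension of $W^{(j)}$ and the slope $\mathcal{H}_N(W^{(j)} | W^{(j-1)})$, which in this setting equals the common entropy of the $j$th group.

```python
from math import log2, isclose

def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def lemma4_chain(entropies):
    """Return [(dim W^(j), H_N(W^(j) | W^(j-1)))] for independent Y_i with the
    given entropies, following the ordering (20) and the chain (21)."""
    chain = []
    for k, e in enumerate(sorted(entropies), start=1):
        if chain and isclose(chain[-1][1], e, abs_tol=1e-12):
            chain[-1] = (k, e)     # same entropy level: enlarge current subspace
        else:
            chain.append((k, e))   # strictly larger entropy: new chain element
    return chain

# Entropies as in Example 1, with illustrative p1 = 0.05 and p2 = 0.1.
chain = lemma4_chain([h(0.05), h(0.05), h(0.1), 1.0])
print([dim for dim, _ in chain])
```

With all entropies equal the chain has length $r = 1$, and with all entropies distinct it has length $r = m$, which matches Remark 2.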



A. Identifying the Optimal Subspace Under the SS Approach 

Theorem 5 (Optimal Rate under SS approach): Consider the distributed source coding problem shown in Fig. 2 having $m$ sources $X_1, \ldots, X_m$ and a receiver that is interested in decoding the $s$-dimensional subspace $W$. Let $W^{(0)} \subsetneq W^{(1)} \subsetneq \cdots \subsetneq W^{(r)}$ be the unique subspace-chain decomposition of the vector space $V = \langle X_1, \ldots, X_m \rangle$ identified in Theorem 3. Then an optimal subspace for decoding $W$ under the SS approach is given by $U = W^{(j_0)}$, where $j_0$ is the unique integer, $1 \le j_0 \le r$, satisfying

$$W \subseteq W^{(j_0)}, \qquad W \nsubseteq W^{(j_0 - 1)}.$$

Furthermore,

$$R_{SS}(W) = \mathcal{H}_N(W^{(j_0)} | W^{(j_0 - 1)}). \tag{22}$$

Proof: See Appendix D. ■
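Once the chain is known, finding $j_0$ is a containment test over the finite field. The sketch below is restricted to $\mathbb{F}_2$ and encodes each linear combination of $X_1, \ldots, X_4$ as a 4-bit mask; this encoding and the function names are assumptions made here for illustration. The chain is the one quoted in (7) for Example 1.

```python
def reduce_f2(v, basis):
    """Reduce the bitmask v against an XOR basis; the result is 0 iff v lies in the span."""
    for b in basis:
        v = min(v, v ^ b)          # clears the leading bit of b from v if set
    return v

def span_basis(vectors):
    """Build an XOR basis (over F_2) for the span of the given bitmasks."""
    basis = []
    for v in vectors:
        r = reduce_f2(v, basis)
        if r:
            basis.append(r)
    return basis

def optimal_chain_index(chain, w_gens):
    """Smallest j with W contained in W^(j); chain[j-1] lists generators of W^(j)."""
    for j, gens in enumerate(chain, start=1):
        basis = span_basis(gens)
        if all(reduce_f2(w, basis) == 0 for w in w_gens):
            return j
    raise ValueError("W is not contained in the last chain element")

# Example 1, with X_1, X_2, X_3, X_4 mapped to bits 0b1000, 0b0100, 0b0010, 0b0001.
chain = [
    [0b1100, 0b0110],                      # W^(1) = <X1+X2, X2+X3>
    [0b1100, 0b0110, 0b0011],              # W^(2) = <X1+X2, X2+X3, X3+X4>
    [0b1000, 0b0100, 0b0010, 0b0001],      # W^(3) = V
]
j0 = optimal_chain_index(chain, [0b1111])  # W = <X1+X2+X3+X4>
print(j0)
```

Because the chain is strictly increasing, the first index at which containment holds is exactly the $j_0$ of the theorem.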
Corollary 6: With the $W^{(j)}$, $1 \le j \le r$, as above,



B. A subspace computation problem for which SS approach is sum-rate optimal 

Consider the setting where there are $m$ sources $X_1, \ldots, X_m$ having a common alphabet $\mathbb{F}_2$, with $m$ even, and a receiver interested in computing the sum $Z = (X_1 + \ldots + X_m) \bmod 2$. Let the joint distribution of the $\{X_i, 1 \le i \le m\}$ be specified as follows:

$$\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{m-1} \\ X_m \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ 1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 1 & \cdots & 1 & 0 \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_{m-1} \\ Y_m \end{bmatrix}, \tag{23}$$

where the $\{Y_i\}_{i=1}^{m}$ are statistically independent random variables such that $Y_i \sim$ Bernoulli$(\frac{1}{2})$ for $i$ odd and $Y_i \sim$ Bernoulli$(p)$, $0 < p < \frac{1}{2}$, for $i$ even. When $m = 2$, $X_1$ and $X_2$ can be verified to possess a doubly-symmetric joint distribution (see [1]), and this is precisely the class of distributions for which Körner and Marton showed that a common linear encoder is sum-rate optimal for the computation of the modulo-2 sum, $Z = X_1 + X_2$. We now assume $m > 2$ in the above setting and show that the SS approach yields the optimal sum rate while the CC and SW approaches do not. Note that by optimal sum rate we mean that this is the best sum rate that can be achieved for the subspace computation problem, even if encoders other than linear encoders were permitted in Fig. 1.
From Lemma 4 we know that the unique subspace chain for $V = \langle X_1, \ldots, X_m \rangle$ is given by $\{0\} \subsetneq W^{(1)} \subsetneq W^{(2)}$, where

$$W^{(1)} = \langle Y_2, Y_4, \ldots, Y_m \rangle = \langle X_1 + X_2, \ X_3 + X_4, \ \ldots, \ X_{m-1} + X_m \rangle$$

and $W^{(2)} = V$. Clearly, the subspace of interest $W = \langle X_1 + X_2 + \ldots + X_m \rangle \subseteq W^{(1)}$ and hence, by Theorem 5, $W^{(1)}$ is an optimal subspace to decode $W$ under the SS approach. The minimum symmetric rate is given by

$$R_{SS}(W) = \mathcal{H}_N(W^{(1)}) = \frac{H(Y_2, Y_4, \ldots, Y_m)}{\left( \frac{m}{2} \right)} = h(p),$$

yielding a sum rate $R_{SS}^{(sum)} = m\,h(p)$.

Now, under the CC approach, since we directly decode the single linear combination $Z$, the sum rate is given by (Theorem 1)

$$R_{CC}^{(sum)} = mH(X_1 + \ldots + X_m) \overset{(a)}{=} mH(Y_2 + Y_4 + \ldots + Y_m) \overset{(b)}{>} m\,h(p) = R_{SS}^{(sum)},$$

where (a) follows from (23) and (b) follows since the $\{Y_i\}$ are independent and $p < \frac{1}{2}$. Also, under the SW approach, in which the whole space $V$ is first decoded before computing $W$, the sum rate is given by

$$R_{SW}^{(sum)} = H(Y_1, \ldots, Y_m) = \frac{m}{2}\left(1 + h(p)\right) > R_{SS}^{(sum)}.$$

We now show that the SS approach is sum-rate optimal. If $(R_1, \ldots, R_m)$ is any achievable rate tuple, then $\forall i = 1, \ldots, m$, it must be true that

$$\begin{aligned} R_i &\overset{(a)}{\ge} H(X_1 + \ldots + X_m \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_m) \\ &= H(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_m) \\ &\overset{(b)}{=} H(Y_1 + \ldots + Y_i \mid Y_1, \ldots, Y_{i-1}, X_{i+1}, \ldots, X_m) \\ &\overset{(c)}{=} H(Y_i \mid Y_1, \ldots, Y_{i-1}, Y_i + Y_{i+1}, Y_{i+2}, \ldots, Y_m) \\ &\overset{(d)}{=} \begin{cases} H(Y_i \mid Y_i + Y_{i+1}), & i < m \\ H(Y_i), & i = m \end{cases} \\ &= h(p), \end{aligned} \tag{24}$$

where (a) follows by considering a system in which we give $\{X_1, \ldots, X_m\} \setminus \{X_i\}$ as side information at the receiver, (b), (c) follow from (23) and (d) follows from the independence of the $\{Y_i, i = 1, \ldots, m\}$. The bound in (24) holds true for all sources and hence $\sum_{i=1}^{m} R_i \ge m\,h(p)$. Since $R_{SS}^{(sum)} = m\,h(p)$, it follows that the SS approach is sum-rate optimal.
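The three sum rates of this subsection are easy to tabulate. The sketch below is illustrative only: `m` and `p` are arbitrary admissible values, the function names are hypothetical, and the CC rate uses the standard closed form $\frac{1}{2}\left(1 - (1 - 2p)^k\right)$ for the probability that the modulo-2 sum of $k$ independent Bernoulli$(p)$ bits equals 1.

```python
from math import log2

def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def parity_prob(p, k):
    """P(the modulo-2 sum of k independent Bernoulli(p) bits equals 1)."""
    return (1 - (1 - 2 * p) ** k) / 2

def section_rates(m, p):
    """Sum rates (SS, CC, SW) for the distribution (23) with m even, m > 2."""
    if m % 2 or m <= 2:
        raise ValueError("m must be even and larger than 2")
    ss = m * h(p)                          # m h(p)
    cc = m * h(parity_prob(p, m // 2))     # m H(Y_2 + Y_4 + ... + Y_m)
    sw = (m / 2) * (1 + h(p))              # H(Y_1, ..., Y_m)
    return ss, cc, sw

ss, cc, sw = section_rates(4, 0.1)         # illustrative parameters
print(f"SS={ss:.3f}  CC={cc:.3f}  SW={sw:.3f}")
```

For these parameters the SS rate is strictly below both alternatives, in line with the claim that neither the CC nor the SW approach is optimal here.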



V. Nested Codes Approach 

The NC approach to the subspace computation problem is a natural outgrowth of our decomposition theorem for the vector space $V$. Under this approach, a sequential decoding procedure is adopted in which $W^{(j)}$ is decoded using $W^{(j-1)}$ as side information.



A. CC Approach with Side Information 

As before, we have a receiver that is interested in computing a subspace $W$ of $V$, with the difference this time that the receiver possesses knowledge of a subspace $S$ of $W$, $S = \langle Y_1, \ldots, Y_{\rho_S} \rangle$, as side information. Let $T$ be a subspace of $W$ complementary to $S$ in $W$, i.e., $W$ is a direct sum of $S$ and $T$, which is denoted by $W = S \oplus T$. Then clearly, it suffices to compute $T$ given $S$ as side information.

We claim that there exists a complement $T$ of $S$ in $W$ which is a function of at most $(m - \rho_S)$ of the sources. This follows from noting that a basis for $S$ can be extended to a basis for $V$ by adding $(m - \rho_S)$ of the $\{X_i\}$. Without loss of generality, we may assume that these are the random variables $X_{\rho_S + 1}, \ldots, X_m$. Also, the intersection of $W$ with any complement of $S$ in $V$ is clearly a complement of $S$ in $W$. It follows that there is a complement $T$ of $S$ in $W$ which is only a function of the $(m - \rho_S)$ sources $X_{\rho_S + 1}, \ldots, X_m$, and thus it is enough to encode the sources $X_{\rho_S + 1}, \ldots, X_m$.

We adopt the CC approach here and hence, $n$-length realizations of all the $(m - \rho_S)$ sources $X_{\rho_S + 1}, \ldots, X_m$ are encoded by a common matrix encoder $A$. Now, if $T = \langle [X_{\rho_S + 1}, \ldots, X_m]\Gamma_T \rangle$ for some $(m - \rho_S) \times \rho_T$ matrix $\Gamma_T$ of rank $\rho_T$, then the receiver, as a first step, multiplies the received matrix $[A\mathbf{X}_{\rho_S + 1} \ldots A\mathbf{X}_m]$ by $\Gamma_T$ on the right. These are then decoded using the side information $\mathbf{Y}_1, \ldots, \mathbf{Y}_{\rho_S}$ to obtain estimates of $\mathbf{T}$ (see Fig. 7). The minimum symmetric rate for this approach is presented below.

Fig. 7. CC approach for decoding with side information, when side information is linearly independent of the sources.

Theorem 7: Consider a distributed source coding problem, where the receiver is interested in computing the subspace $W$, given that $S \subseteq W$ is available as side information to the receiver. The minimum symmetric rate under the CC based approach presented above is given by

$$R_{CC}(W | S) = \max_{T_1 \subsetneq T} \left\{ \mathcal{H}_N(T | T_1 \oplus S) \right\} \tag{25}$$

$$= \max_{W_1 \subsetneq W \ \text{s.t.} \ W_1 \supseteq S} \left\{ \mathcal{H}_N(W | W_1) \right\}. \tag{26}$$

Proof: Similar to the proof of Theorem 1. ■

Note from (26) that irrespective of the particular complementary subspace $T$ that we choose to compute, the symmetric rate remains the same. The specific choice of $T$ determines, however, the number of $\{X_i, i = 1, \ldots, m\}$ that are actually encoded. Since $T$ has been selected such that only $(m - \rho_S)$ sources are encoded, the achievable sum rate in this case is given by $R_{CC}^{(sum)}(W | S) = (m - \rho_S) R_{CC}(W | S)$.

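Corollary 8: Consider the case where $W = W^{(j)}$ and $S = W^{(j-1)}$. Then

$$R_{CC}(W^{(j)} | W^{(j-1)}) = \max_{U_1 \subsetneq W^{(j)} \ \text{s.t.} \ U_1 \supseteq W^{(j-1)}} \mathcal{H}_N(W^{(j)} | U_1) = \mathcal{H}_N(W^{(j)} | W^{(j-1)}), \tag{27}$$

where the last equality follows since

$$\mathcal{H}_N(W^{(j)} | W^{(j-1)}) \le \max_{U_1 \subsetneq W^{(j)} \ \text{s.t.} \ U_1 \supseteq W^{(j-1)}} \mathcal{H}_N(W^{(j)} | U_1) \le \max_{U_1 \subsetneq W^{(j)}} \mathcal{H}_N(W^{(j)} | U_1) \overset{(a)}{=} \mathcal{H}_N(W^{(j)} | W^{(j-1)}), \tag{28}$$

where (a) follows from Corollary 6. Thus, $R_{CC}(W^{(j)} | W^{(j-1)}) = R_{CC}(W^{(j)})$, i.e., the rates per encoder are the same in this instance with and without side information. The difference between the two cases is that in the presence of side information, we need encode only $(m - \rho_{W^{(j-1)}})$ sources as opposed to $m$, leading to a sum rate reduced by the fraction $\frac{m - \rho_{W^{(j-1)}}}{m}$.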
B. NC Approach for Subspace Computation

Consider the chain of subspaces $W^{(0)} \subsetneq W^{(1)} \subsetneq \cdots \subsetneq W^{(r)}$ as obtained from Theorem 3. Assume that we are interested in decoding the subspace $W^{(j)}$, $j \le r$. We will now describe a scheme for decoding $W^{(j)}$ that operates in $j$ stages. At stage $\ell$, $\ell \le j$, we decode $W^{(\ell)}$ using $W^{(\ell - 1)}$ as side information, using the CC based approach described above in Section V-A. Using the same argument as in Section V-A, it follows that at stage $\ell$, $1 \le \ell \le j$, without loss of generality, it is enough to encode the sources $X_{\rho_{W^{(\ell - 1)}} + 1}, \ldots, X_m$.

From Corollary 8, the rate of each of the sources that are encoded in the $\ell$th stage is given by

$$R_\ell \triangleq R_{CC}(W^{(\ell)} | W^{(\ell - 1)}) = \mathcal{H}_N(W^{(\ell)} | W^{(\ell - 1)}). \tag{29}$$

Also, let $A^{(\ell)}$ denote the common encoding matrix used in the $\ell$th stage. Since $R_1 < R_2 < \ldots < R_j$ (see Theorem 3), it can be shown, via a random coding argument and by invoking a union-bound argument on the probability of error calculation, that it is possible to choose the encoding matrices $A^{(1)}, \ldots, A^{(j)}$ having the following nested structure:

$$A^{(1)} = [B_1], \quad A^{(2)} = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \quad \ldots, \quad A^{(j)} = \begin{bmatrix} B_1 \\ \vdots \\ B_j \end{bmatrix}. \tag{30}$$

Please see Appendix E for a proof of this statement.³

Thus, the sum rate achieved for decoding the subspace $W^{(j)}$ under the NC approach is given by

$$R_{NC}^{(sum)}(W^{(j)}) = \sum_{\ell = 1}^{j-1} \left( \rho_{W^{(\ell)}} - \rho_{W^{(\ell - 1)}} \right) R_\ell + \left( m - \rho_{W^{(j-1)}} \right) R_j = \mathcal{H}(W^{(j-1)}) + \left( m - \rho_{W^{(j-1)}} \right) \mathcal{H}_N(W^{(j)} | W^{(j-1)}). \tag{31}$$

As in the case of the SS approach, a scheme for decoding an arbitrary subspace $W$ under the NC approach would
be to decode the subspace $W^{(j_0)}$, where $j_0$ is the unique integer such that $W \subseteq W^{(j_0)}$ and $W \not\subseteq W^{(j_0-1)}$.

Note that whereas the one-stage CC approach for decoding $W^{(j)}$ would have used the highest-rate matrix $A^{(j)}$
for all the sources, the NC approach uses it only for sources $X_{\rho_{W^{(j-1)}}+1}, \ldots, X_m$ and uses lower-rate matrices
for the remaining sources. Thus the NC approach clearly outperforms the SS approach for all subspaces with the
exception of $W^{(1)}$. Even beyond this, the NC approach sum rate improves upon the SW sum rate for all subspaces
$W \subseteq W^{(r-1)}$, while in all other cases it equals the SW sum rate. These comparisons are made explicit in the two
theorems below.

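Theorem 9: The sum rate $R^{(\mathrm{sum})}_{NC}(W^{(j)})$ incurred in using the nested code approach for decoding the subspace
$W^{(j)}$, $1 \le j \le r$, satisfies $R^{(\mathrm{sum})}_{NC}(W^{(j)}) \le m R_{SS}(W^{(j)})$, the sum rate for decoding using the SS approach.

Equality holds iff $j = 1$.
Proof:

\begin{align}
R^{(\mathrm{sum})}_{NC}(W^{(j)}) &= \sum_{\ell=1}^{j-1} \left(\rho_{W^{(\ell)}} - \rho_{W^{(\ell-1)}}\right) \mathcal{H}_N(W^{(\ell)}|W^{(\ell-1)}) + (m - \rho_{W^{(j-1)}})\, \mathcal{H}_N(W^{(j)}|W^{(j-1)}) \nonumber \\
&\overset{(a)}{\le} \sum_{\ell=1}^{j-1} \left(\rho_{W^{(\ell)}} - \rho_{W^{(\ell-1)}}\right) \mathcal{H}_N(W^{(j)}|W^{(j-1)}) + (m - \rho_{W^{(j-1)}})\, \mathcal{H}_N(W^{(j)}|W^{(j-1)}) \nonumber \\
&= m \mathcal{H}_N(W^{(j)}|W^{(j-1)}) = m R_{SS}(W^{(j)}), \tag{32}
\end{align}

where (a) follows by Theorem 5. Since the rates given in Theorem 5 are strictly increasing, (a) is an equality iff
$j = 1$. ■

Footnote: Similar proofs regarding the existence of nested linear codes have been shown in the past; for example, see Q.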

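Theorem 10: The sum rate $R^{(\mathrm{sum})}_{NC}(W^{(j)})$ incurred in using the nested code approach for decoding the subspace
$W^{(j)}$, $1 \le j \le r$, satisfies $R^{(\mathrm{sum})}_{NC}(W^{(j)}) \le \mathcal{H}(V)$, the sum rate for decoding $W^{(j)}$ using the SW approach. Equality
occurs iff $j = r$.
Proof:

\begin{align*}
\mathcal{H}(V) &= \mathcal{H}(W^{(j-1)}) + \sum_{\ell=j}^{r} \mathcal{H}(W^{(\ell)}|W^{(\ell-1)}) \\
&= \mathcal{H}(W^{(j-1)}) + \sum_{\ell=j}^{r} \left(\rho_{W^{(\ell)}} - \rho_{W^{(\ell-1)}}\right) \mathcal{H}_N(W^{(\ell)}|W^{(\ell-1)}) \\
&\overset{(a)}{\ge} \mathcal{H}(W^{(j-1)}) + (m - \rho_{W^{(j-1)}})\, \mathcal{H}_N(W^{(j)}|W^{(j-1)}) = R^{(\mathrm{sum})}_{NC}(W^{(j)}),
\end{align*}

where (a) follows from Theorem 5. Since the rates given in Theorem 5 are strictly increasing, (a) holds with
equality iff $j = r$. ■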



C. An example subspace computation problem for which NC approach is optimal 

We now revisit Example 1 introduced in Section II and show that the NC approach is sum-rate optimal if the
subspace of interest is $W = W^{(2)}$. It is not hard to show that the subspace chain decomposition for the joint
distribution in Example 1 is indeed as given in dT). Thus, from (31), the sum rate achievable using the NC scheme
is given by

\[
R^{(\mathrm{sum})}_{NC}(W^{(2)}) = 2h(p_1) + 2h(p_2). \tag{33}
\]

To show sum-rate optimality, note that if $(R_1, R_2, R_3, R_4)$ is any achievable rate tuple, then we must have

\begin{align}
R_4 &\overset{(a)}{\ge} H(X_4|X_1, X_2, X_3) \nonumber \\
&\overset{(b)}{=} H(Y_4|Y_1, Y_2, Y_3 + Y_4) \nonumber \\
&\overset{(c)}{=} H(Y_4|Y_3 + Y_4) = h(p_2), \tag{34}
\end{align}

where (a) follows by considering a system in which $X_1, X_2, X_3$ is given as side information, (b) follows from
© and (c) follows from the independence of $\{Y_i, i = 1, \ldots, 4\}$. Next, if we consider a second system in which
$X_4$ alone is given as side information, then it must be that

\begin{align}
R_1 + R_2 + R_3 &\ge H(X_1 + X_2, X_2 + X_3, X_3 + X_4|X_4) \nonumber \\
&= H(Y_1, Y_2, Y_3|Y_4) \nonumber \\
&= h(p_2) + 2h(p_1). \tag{35}
\end{align}

Combining (34) and (35), we get the lower bound on the sum rate given by $R_1 + R_2 + R_3 + R_4 \ge 2h(p_1) + 2h(p_2)$.
This, along with (33), implies sum-rate optimality of the NC approach. Note from Theorems 9 and 10 that the
subspace and the SW approaches are both strictly suboptimal in this case.
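
The comparison in Theorems 9 and 10 can be checked numerically from the chain dimensions and the per-level normalized entropies. The sketch below is only illustrative: the function names, the values $p_1 = 0.1$, $p_2 = 0.2$ and the unit entropy of the last level are assumptions chosen here, not values taken from Example 1.

```python
from math import log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def sum_rates(rho, hn, j):
    """Return (NC, SS, SW) sum rates for decoding W^(j).

    rho[l] is the dimension of W^(l) (rho[0] = 0, rho[-1] = m) and hn[l] is the
    normalized entropy H_N(W^(l) | W^(l-1)) for l >= 1 (hn[0] is unused).
    """
    m = rho[-1]
    # Entropy of W^(j-1): sum of the per-level conditional entropies.
    h_prev = sum((rho[l] - rho[l - 1]) * hn[l] for l in range(1, j))
    nc = h_prev + (m - rho[j - 1]) * hn[j]   # nested codes, equation (31)
    ss = m * hn[j]                           # one common matrix at the stage-j rate
    sw = sum((rho[l] - rho[l - 1]) * hn[l] for l in range(1, len(rho)))  # H(V)
    return nc, ss, sw

# Hypothetical chain: dimensions 0 < 2 < 3 < 4 with increasing normalized entropies.
rho = [0, 2, 3, 4]
hn = [None, h(0.1), h(0.2), 1.0]
nc, ss, sw = sum_rates(rho, hn, j=2)
```

With these assumed numbers the NC sum rate equals $2h(0.1) + 2h(0.2)$ and lies strictly below both the SS and the SW values, in line with the strict inequalities for $1 < j < r$.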
Appendix A
Proof of Theorem 1

Before proceeding to prove the theorem, for notational simplicity we shall denote $[Z_1, \ldots, Z_s]$ by $Z^{[1:s]}$ and
assume that $W_1$ is generated by $Z^{[1:s]}G$, where $G$ is a full rank $s \times \nu$ matrix and $\nu = \rho_{W_1}$. Thus, the set of
achievable rates per encoder under the CC approach given by Theorem 1 can be rewritten as follows.

\[
\mathcal{R}_{CC}(W) = \left\{ R : R \ge \frac{1}{s - \nu} H\!\left(Z^{[1:s]} \,\middle|\, Z^{[1:s]}G\right) \right\} \tag{36}
\]

for every choice of $G$ whose column space corresponds to a $\nu$-dimensional subspace of $\mathbb{F}_q^s$, $0 \le \nu \le s - 1$.

In order to prove the theorem, we shall work with the system model shown in Fig. 8, which is equivalent to that
of the CC approach. This system is the same as the SW system except that all the sources are encoded by a common matrix
$A$. A rate $R$ per encoder is achievable in this equivalent system iff it is achievable in the original system of interest
(see Fig. IIJ). 

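Achievability: The notion of typical sets will be required to prove the achievability. The following definition of the
$\epsilon$-typical set of a random variable $X$ will be used:

\[
A_\epsilon^{(n)}(X) = \left\{ \mathbf{x} : \left| \frac{N(a|\mathbf{x})}{n} - P_X(a) \right| \le \epsilon P_X(a), \ \forall a \in \mathcal{X} \right\}, \tag{37}
\]

where $\mathcal{X}$ is the alphabet of the random variable $X$ and $N(a|\mathbf{x})$ denotes the number of occurrences of the symbol $a$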
in the realization x. We refer the reader to ifTSl for properties of this typical set as well as the related notions of 
conditional and joint typical sets. We shall make use of the following lemma in the proof of the achievability. 
Fig. 8. Equivalent system model for the CC approach



Lemma 11: Let $X$ and $Y$ be two discrete random variables taking values over finite alphabets $\mathcal{X}$ and $\mathcal{Y}$
respectively. Let $Y = f(X)$ be a deterministic function of $X$. Let $\mathbf{x}$ be an $n$-length realization of the i.i.d. random
variable $X$ and $\mathbf{y} = f^n(\mathbf{x})$, where $f^n(\mathbf{x}) = (f(x_1), \ldots, f(x_n))$. Then we have

\[
\left\{ \mathbf{x} \in A_\epsilon^{(n)}(X) \,\middle|\, f^n(\mathbf{x}) = \mathbf{y} \right\} = A_\epsilon^{(n)}(X|\mathbf{y}). \tag{38}
\]

Proof: The proof can be shown by using the definition of the typical set as given by (37). □
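
The per-symbol relative deviation bound of (37) is straightforward to test on a concrete sequence. The following is a minimal sketch; the function name `is_typical`, the dictionary representation of $P_X$ and the example distribution are assumptions made here for illustration.

```python
from collections import Counter

def is_typical(x, pmf, eps):
    """Check |N(a|x)/n - P_X(a)| <= eps * P_X(a) for every symbol a in the alphabet."""
    n = len(x)
    counts = Counter(x)
    # A symbol outside the alphabet has zero probability and breaks typicality.
    if any(a not in pmf for a in counts):
        return False
    return all(abs(counts.get(a, 0) / n - p) <= eps * p for a, p in pmf.items())

# Hypothetical binary source with P_X(0) = 0.75 and P_X(1) = 0.25.
pmf = {0: 0.75, 1: 0.25}
balanced = [0, 0, 0, 1] * 5   # empirical frequencies match the pmf exactly
all_zero = [0] * 20           # the symbol 1 never occurs
```

Here `is_typical(balanced, pmf, 0.1)` holds because the empirical frequencies equal the probabilities, while `all_zero` fails for any $\epsilon < 1$ since the relative deviation for the symbol 1 equals one.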
Achievability will be shown using a random coding argument by averaging over the set of all matrix encoders of the
form $A : \mathbb{F}_q^n \to \mathbb{F}_q^k$, where the $k \times n$ matrix $A$ is assumed to be a realization of the random matrix $\mathbf{A}$, distributed
uniformly on the ensemble $M_{k \times n}(\mathbb{F}_q)$. We will apply the joint typical set decoder and calculate the probability of
error $P_e^{(n)}$ averaged over all the source symbols and also over all realizations of $\mathbf{A}$.

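Let the source sequences be $\mathbf{z}^{[1:s]}$. Then the decoder will declare $\hat{\mathbf{z}}^{[1:s]}$ to be the transmitted sequence if it is
the unique sequence that belongs to $A_\epsilon^{(n)}(Z^{[1:s]})$ and satisfies $A\hat{\mathbf{z}}^{[1:s]} = A\mathbf{z}^{[1:s]}$. Thus, the decoder will make an error if any
one of the following events happens:

\begin{align}
E_1 &: \mathbf{z}^{[1:s]} \notin A_\epsilon^{(n)}(Z^{[1:s]}) \tag{39} \\
E_2 &: \exists\, \mathbf{v}^{[1:s]} \in A_\epsilon^{(n)}(Z^{[1:s]}) \text{ such that } \mathbf{v}^{[1:s]} \ne \mathbf{z}^{[1:s]} \text{ and } A\mathbf{v}^{[1:s]} = A\mathbf{z}^{[1:s]}. \tag{40}
\end{align}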

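Let us denote $\Delta^{[1:s]} = \mathbf{v}^{[1:s]} - \mathbf{z}^{[1:s]}$. Then, the probability of error is upper bounded as

\begin{align}
P_e^{(n)} &\le P(E_1) + P(E_2) \nonumber \\
&\le \delta_n + \sum_{\mathbf{z}^{[1:s]} \in A_\epsilon^{(n)}(Z^{[1:s]})} P(\mathbf{z}^{[1:s]}) \underbrace{\sum_{\substack{\mathbf{v}^{[1:s]} \in A_\epsilon^{(n)}(Z^{[1:s]}) \\ \mathbf{v}^{[1:s]} \ne \mathbf{z}^{[1:s]}}} P(\mathbf{A}\Delta^{[1:s]} = 0)}_{P_1}, \tag{41}
\end{align}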

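where $\delta_n \to 0$ as $n \to \infty$. We will now compute $P(\mathbf{A}\Delta^{[1:s]} = 0)$ as follows. Let $\mathcal{N}(\Delta^{[1:s]})$ be the nullspace of
$\Delta^{[1:s]}$ and $\nu$ be its dimension. Since $\Delta^{[1:s]} \ne 0$, we have $0 \le \nu \le s - 1$. The rank of $\Delta^{[1:s]}$ is $(s - \nu)$ and hence the
rank of the left nullspace of $\Delta^{[1:s]}$ is $n - (s - \nu)$. Thus, the number of matrices $A$ which satisfy $A\Delta^{[1:s]} = 0$ is
$q^{k(n - (s - \nu))}$. Since there are $q^{kn}$ choices for the matrix $A$, we get

\[
P(\mathbf{A}\Delta^{[1:s]} = 0) = \frac{q^{k(n - (s - \nu))}}{q^{kn}} = q^{-k(s - \nu)}. \tag{42}
\]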
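
The counting argument behind (42) can be sanity-checked by exhaustive enumeration for a tiny binary case. The sketch below assumes $q = 2$ and uses hypothetical helper names (`rank_gf2`, `prob_annihilated`) and two hand-picked difference matrices; it enumerates every $k \times n$ binary matrix and is therefore only practical for very small sizes.

```python
from itertools import product

def rank_gf2(matrix):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    rows = [int("".join(map(str, r)), 2) for r in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def prob_annihilated(delta, k):
    """Fraction of k x n binary matrices A with A * delta = 0 over GF(2)."""
    n, s = len(delta), len(delta[0])
    hits = 0
    for bits in product((0, 1), repeat=k * n):
        A = [bits[i * n:(i + 1) * n] for i in range(k)]
        if all(sum(A[i][t] * delta[t][c] for t in range(n)) % 2 == 0
               for i in range(k) for c in range(s)):
            hits += 1
    return hits / 2 ** (k * n)

# n = 3, s = 2: each delta is an n x s matrix whose columns are difference sequences.
delta_full = [[1, 0], [0, 1], [1, 1]]   # rank 2, so nu = 0
delta_low = [[1, 1], [0, 0], [1, 1]]    # rank 1, so nu = 1
```

Since the rank of $\Delta^{[1:s]}$ is $s - \nu$, the enumerated fraction should equal $2^{-k \cdot \mathrm{rank}}$ for each example, which is what the formula predicts.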
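Thus, partitioning the set of all $\Delta^{[1:s]}$ based on the rank of $\mathcal{N}(\Delta^{[1:s]})$, we can rewrite $P_1$ in (41) as

\[
P_1 = \sum_{\nu=0}^{s-1} \ \sum_{W_1:\, \dim(W_1) = \nu} \ \sum_{\substack{\mathbf{v}^{[1:s]} \in A_\epsilon^{(n)}(Z^{[1:s]}): \\ \mathcal{N}(\Delta^{[1:s]}) = W_1}} q^{-k(s - \nu)}, \tag{43}
\]

where $W_1$ is a $\nu$-dimensional subspace of $\mathbb{F}_q^s$. We shall now provide an alternative expression for the set $\{\mathbf{v}^{[1:s]} \in
A_\epsilon^{(n)}(Z^{[1:s]}) \mid \mathcal{N}(\Delta^{[1:s]}) = W_1\}$ as follows. Let $\{g_1, g_2, \ldots, g_\nu\}$ denote a basis for $W_1$ and let $G_{W_1} = [g_1 \cdots g_\nu]$.
Then

\begin{align}
&\left\{ \mathbf{v}^{[1:s]} \in A_\epsilon^{(n)}(Z^{[1:s]}) \,\middle|\, \mathcal{N}(\Delta^{[1:s]}) = W_1 \right\} \nonumber \\
&\qquad \subseteq \left\{ \mathbf{v}^{[1:s]} \in A_\epsilon^{(n)}(Z^{[1:s]}) \,\middle|\, \mathbf{v}^{[1:s]}G_{W_1} = \mathbf{z}^{[1:s]}G_{W_1} \right\}. \tag{44}
\end{align}

Now applying Lemma 11 to the above equation, wherein we set $X = Z^{[1:s]}$ and $f(X) = Z^{[1:s]}G_{W_1}$, and by noting that
$\mathbf{v}^{[1:s]}$ is an $n$-length realization of $Z^{[1:s]}$, we get

\[
\left\{ \mathbf{v}^{[1:s]} \in A_\epsilon^{(n)}(Z^{[1:s]}) \,\middle|\, \mathbf{v}^{[1:s]}G_{W_1} = \mathbf{z}^{[1:s]}G_{W_1} \right\} = A_\epsilon^{(n)}\!\left(Z^{[1:s]} \,\middle|\, \mathbf{z}^{[1:s]}G_{W_1}\right). \tag{45}
\]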

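We substitute the above equation in (43) and use the resulting expression in (41) to get

\[
P_e^{(n)} \le \delta_n + \sum_{\mathbf{z}^{[1:s]} \in A_\epsilon^{(n)}(Z^{[1:s]})} P(\mathbf{z}^{[1:s]}) \sum_{\nu=0}^{s-1} \ \sum_{W_1:\, \dim(W_1) = \nu} 2^{\,n\left[ H\left(Z^{[1:s]} | Z^{[1:s]}G_{W_1}\right)(1 + \epsilon) - (s - \nu)\frac{k}{n}\log(q) \right]}, \tag{46}
\]

where we used the fact that the size of the conditional typical set is bounded as

\[
\left| A_\epsilon^{(n)}\!\left(Z^{[1:s]} \,\middle|\, \mathbf{z}^{[1:s]}G_{W_1}\right) \right| \le 2^{\,n H\left(Z^{[1:s]} | Z^{[1:s]}G_{W_1}\right)(1 + \epsilon)}. \tag{47}
\]

Thus, a sufficient condition for $P_e^{(n)} \to 0$ is that

\[
\frac{k}{n}\log(q) > \frac{1}{(s - \nu)} H\!\left(Z^{[1:s]} \,\middle|\, Z^{[1:s]}G_{W_1}\right)(1 + \epsilon) \tag{48}
\]

for every choice of $\nu$-dimensional subspace $W_1$ of $\mathbb{F}_q^s$, $0 \le \nu \le s - 1$.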

We will now show the necessity of the inequalities in (36) if reliable decoding of the sources $\mathbf{z}^{[1:s]}$ is desired,
thereby proving that $R_{CC}(W)$ is the minimum symmetric rate achievable under the CC approach. Let $(\Omega, \Omega^c)$ denote
a partition of the sources $Z^{[1:s]}$ such that $|\Omega| = (s - \nu)$, $0 \le \nu \le (s - 1)$, and $Z^{\Omega} = \{Z_j, j \in \Omega\}$. It follows from the
SW lower bound [19] that

\[
R \ge \frac{1}{(s - \nu)} H\!\left(Z^{\Omega} \,\middle|\, Z^{\Omega^c}\right) \tag{49}
\]

is a necessary condition. Note that the above inequalities are exactly those in (36) obtained by choosing the columns
of $G$ from the set of standard basis vectors for $\mathbb{F}_q^s$.

We will now show that the necessity of the remaining inequalities in (36), corresponding to other choices of $G$, is
due to our restriction to a common encoding matrix. Consider a new system (which is also reliable) constructed
as shown in Fig. 9, with the same encoder and decoder as that of the system in Fig. 8, where $\mathbf{y}^{[1:s]} = \mathbf{z}^{[1:s]}P$, $P$
being an $s \times s$ invertible matrix. Applying the SW bounds [19] to this new system we get

\[
R \ge \frac{1}{(s - \nu)} H\!\left(Y^{\Pi} \,\middle|\, Y^{\Pi^c}\right), \tag{50}
\]
Fig. 9. A derived system to compute Y'



where $\Pi$ is some subset of the sources $Y^{[1:s]}$ such that $|\Pi| = (s - \nu)$, $0 \le \nu \le (s - 1)$. Since $Y^{[1:s]} = Z^{[1:s]}P$
and $P$ is an invertible matrix, the above equation can be written as

\[
R \ge \frac{1}{(s - \nu)} H\!\left(Z^{[1:s]} \,\middle|\, Z^{[1:s]}G\right), \tag{51}
\]

where $G$ is an $s \times \nu$ submatrix of $P$ containing the $\nu$ columns corresponding to $Y^{\Pi^c}$. Since the above bound is
true for every invertible matrix $P$, we run through all subspaces of $\mathbb{F}_q^s$ of dimension less than or equal to $s - 1$,
thereby establishing the necessity of the inequalities in (36).

Appendix B 
Proof of Theorem 3

We will first present a few properties of normalized and conditional normalized entropies, which will subsequently
be used to prove the theorem.

Lemma 12: Consider $U, W \subseteq V$ such that $U \not\subseteq W$. Then

\[
\mathcal{H}_N(U + W_1|W) = \mathcal{H}_N(U|W), \quad \forall\, W_1 \subseteq W. \tag{52}
\]

Proof: Follows directly by invoking the equivalent definition of conditional normalized entropy in (14). □

Lemma 13: Consider $U, W \subseteq V$ such that $U \not\subseteq W$. Then

\[
\mathcal{H}_N(U|W) \le \mathcal{H}_N(U|U \cap W). \tag{53}
\]

Proof: Follows from the definition of conditional normalized entropy in (13) and by using the fact that
$\mathcal{H}(U|W) \le \mathcal{H}(U|U \cap W)$. □

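Lemma 14: Consider $W, U_1, U \subseteq V$ such that $W \subsetneq U_1 \subsetneq U$. Then, one of the following three conditions is
true.

a) $\mathcal{H}_N(U_1|W) < \mathcal{H}_N(U|W) < \mathcal{H}_N(U|U_1)$.

b) $\mathcal{H}_N(U_1|W) = \mathcal{H}_N(U|W) = \mathcal{H}_N(U|U_1)$.

c) $\mathcal{H}_N(U_1|W) > \mathcal{H}_N(U|W) > \mathcal{H}_N(U|U_1)$.

Proof: $\mathcal{H}_N(U|W)$ can be written as a convex combination of $\mathcal{H}_N(U_1|W)$ and $\mathcal{H}_N(U|U_1)$ as

\[
\mathcal{H}_N(U|W) = \alpha \mathcal{H}_N(U_1|W) + (1 - \alpha) \mathcal{H}_N(U|U_1), \tag{54}
\]

where $\alpha = \frac{\rho_{U_1} - \rho_W}{\rho_U - \rho_W}$. The lemma now follows. □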
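
For completeness, the convex-combination identity (54) can be derived from the chain rule. The short derivation below assumes that, for nested subspaces $W \subsetneq U$, the conditional normalized entropy is $\mathcal{H}_N(U|W) = \mathcal{H}(U|W)/(\rho_U - \rho_W)$, as the surrounding proofs use it.

```latex
\begin{align*}
\mathcal{H}(U|W) &= \mathcal{H}(U_1|W) + \mathcal{H}(U|U_1) \\
 &= (\rho_{U_1}-\rho_W)\,\mathcal{H}_N(U_1|W) + (\rho_U-\rho_{U_1})\,\mathcal{H}_N(U|U_1).
\end{align*}
Dividing both sides by $\rho_U - \rho_W$ gives
\[
\mathcal{H}_N(U|W) = \alpha\,\mathcal{H}_N(U_1|W) + (1-\alpha)\,\mathcal{H}_N(U|U_1),
\qquad \alpha = \frac{\rho_{U_1}-\rho_W}{\rho_U-\rho_W} \in (0,1).
\]
```

Since $0 < \alpha < 1$, the value $\mathcal{H}_N(U|W)$ lies strictly between $\mathcal{H}_N(U_1|W)$ and $\mathcal{H}_N(U|U_1)$ unless the two coincide, which gives exactly the three cases a), b) and c).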

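Proof of Theorem 3: Consider the set

\[
S_{W_0} = \left\{ U \,\middle|\, W_0 \subsetneq U \text{ and } \mathcal{H}_N(U|W_0) \le \mathcal{H}_N(W|W_0) \ \ \forall\, W \subseteq V,\ W_0 \subsetneq W \right\}, \tag{55}
\]

i.e., $S_{W_0}$ is the set of all subspaces of $V$ which contain $W_0$ and have the least normalized conditional entropy
conditioned on $W_0$. We claim that $S_{W_0}$ is closed under subspace addition, i.e., if $U_1, U_2 \in S_{W_0}$, then $U_1 + U_2 \in S_{W_0}$.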
The claim will be proved shortly. Since $S_{W_0}$ is a finite set, this will imply that $Q(W_0) \triangleq \sum_{U \in S_{W_0}} U$ is the unique
maximal element of $S_{W_0}$. Now, consider the chain obtained sequentially as follows:

\[
W^{(j)} \triangleq Q\!\left(W^{(j-1)}\right), \quad \forall\, j \ge 1, \tag{56}
\]

where $W^{(0)} = \{0\}$. The construction proceeds until the $r$th stage, where $W^{(r)} = V$. It is clear that the chain
obtained from (56) satisfies conditions 1) and 2) in Theorem 3 and is also unique. To prove that this chain also
satisfies (19), apply Lemma 14 to the three element subspace chain $W^{(j-1)} \subsetneq W^{(j)} \subsetneq W^{(j+1)}$, $1 \le j \le r - 1$.
Since $W^{(j)} = Q(W^{(j-1)})$, we have that $\mathcal{H}_N(W^{(j)}|W^{(j-1)}) < \mathcal{H}_N(W^{(j+1)}|W^{(j-1)})$. Hence condition a) of Lemma
14 is true in this case and thus

\[
\mathcal{H}_N(W^{(j)}|W^{(j-1)}) < \mathcal{H}_N(W^{(j+1)}|W^{(j)}). \tag{57}
\]

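Now, we will prove our claim that $S_{W_0}$ is closed under subspace addition. Let $U_1, U_2 \in S_{W_0}$. If $U_2 \subseteq U_1$ or
$U_1 \subseteq U_2$, the claim is trivially true. Thus, assume that $U_2 \not\subseteq U_1$, $U_1 \not\subseteq U_2$ and consider the following chain of
inequalities.

\begin{align}
\mathcal{H}_N(U_1|W_0) &\overset{(a)}{\le} \mathcal{H}_N(U_1 + U_2|U_1) \tag{58} \\
&\overset{(b)}{=} \mathcal{H}_N(U_2|U_1) \tag{59} \\
&\overset{(c)}{\le} \mathcal{H}_N(U_2|U_1 \cap U_2) \nonumber \\
&\overset{(d)}{\le} \mathcal{H}_N(U_2|W_0), \tag{60}
\end{align}

where (a) follows by applying Lemma 14 to $W_0 \subsetneq U_1 \subsetneq U_1 + U_2$ and using the fact that $\mathcal{H}_N(U_1|W_0) \le
\mathcal{H}_N(U_1 + U_2|W_0)$ (since $U_1 \in S_{W_0}$), (b) follows from Lemma 12, (c) follows from Lemma 13 and finally, (d)
follows trivially if $W_0 = U_1 \cap U_2$; else if $W_0 \subsetneq U_1 \cap U_2$, by applying Lemma 14 to $W_0 \subsetneq U_1 \cap U_2 \subsetneq U_2$ and
noting that $U_2 \in S_{W_0}$.

But $\mathcal{H}_N(U_1|W_0) = \mathcal{H}_N(U_2|W_0)$ and thus all inequalities in (60) are equalities. In particular, from (a), we get
that $\mathcal{H}_N(U_1 + U_2|U_1) = \mathcal{H}_N(U_1|W_0)$. Lemma 14 now implies that $\mathcal{H}_N(U_1 + U_2|W_0) = \mathcal{H}_N(U_1|W_0)$ and thus
$U_1 + U_2 \in S_{W_0}$.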
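
For very small examples, the sequential construction in (56) can be carried out by exhaustive search. The sketch below is an illustration under stated assumptions rather than the paper's procedure: it takes $q = 2$, assumes independent Bernoulli components (the setting of Lemma 4), encodes a linear combination as an integer bitmask, and breaks numerical ties with a tolerance `tol`. All function names are hypothetical.

```python
from itertools import combinations, product
from math import log2

def span(vectors):
    """GF(2) span of bitmask-encoded vectors, returned as a frozenset."""
    s = {0}
    for v in vectors:
        s |= {x ^ v for x in s}
    return frozenset(s)

def all_subspaces(m):
    nonzero = range(1, 2 ** m)
    return {span(c) for d in range(m + 1) for c in combinations(nonzero, d)}

def dim(U):
    return len(U).bit_length() - 1   # |U| = 2^dim

def entropy_of_subspace(U, probs):
    """Joint entropy (bits) of all linear combinations in U of independent bits."""
    pmf = {}
    for y in product((0, 1), repeat=len(probs)):
        p = 1.0
        for yi, pi in zip(y, probs):
            p *= pi if yi else 1 - pi
        ymask = sum(bit << i for i, bit in enumerate(y))
        key = tuple(bin(u & ymask).count("1") % 2 for u in sorted(U))
        pmf[key] = pmf.get(key, 0.0) + p
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def subspace_chain(probs, tol=1e-9):
    """Build W^(0) = {0}, W^(j) = Q(W^(j-1)) by brute force."""
    m = len(probs)
    subspaces = all_subspaces(m)
    H = {U: entropy_of_subspace(U, probs) for U in subspaces}
    chain = [span([])]
    while dim(chain[-1]) < m:
        W0 = chain[-1]
        cands = [U for U in subspaces if W0 < U]   # proper superspaces of W0
        hn = {U: (H[U] - H[W0]) / (dim(U) - dim(W0)) for U in cands}
        best = min(hn.values())
        minimizers = [U for U in cands if hn[U] <= best + tol]
        # Q(W0): the sum of all minimizers is the unique maximal minimizer.
        chain.append(span(v for U in minimizers for v in U))
    return chain

# Hypothetical source: Y1, Y2 ~ Bernoulli(0.1) and Y3 ~ Bernoulli(0.3), independent.
chain = subspace_chain((0.1, 0.1, 0.3))
```

For this assumed source the search returns subspaces of dimensions 0, 2 and 3, with the middle element spanned by $Y_1$ and $Y_2$, which agrees with the chain that Lemma 4 describes for independent components ordered by entropy.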



Appendix C 
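Proof of Lemma 4

Since $\{X_i\}$ and $\{Y_i\}$ are related via an invertible matrix, $\langle X_1, \ldots, X_m \rangle = \langle Y_1, \ldots, Y_m \rangle$. Thus, we will just
find the subspace chain for the $\{Y_i\}$. Set $U_0 = \{0\}$ and $U_j = \langle Y_1, Y_2, \ldots, Y_{\ell_j} \rangle$, $1 \le j \le r$. Let $U \subseteq V$ be
such that $U_{j-1} \subsetneq U$. We will now show that $\mathcal{H}_N(U|U_{j-1}) \ge \mathcal{H}_N(U_j|U_{j-1})$ with equality only if $U \subseteq U_j$. This
will imply that the chain $U_0 \subsetneq U_1 \subsetneq \cdots \subsetneq U_r$ satisfies the conditions of Theorem 3 and hence is the required
chain.

Let $U \cap U_j = U_{j-1} \oplus A$ and $U = (U \cap U_j) \oplus B$, for some subspaces $A, B$. Then, $\mathcal{H}_N(U|U_{j-1})$ can be expanded
as

\begin{align}
\mathcal{H}_N(U|U_{j-1}) &\overset{(a)}{=} \frac{\mathcal{H}(A \oplus B|U_{j-1})}{\rho_A + \rho_B} \nonumber \\
&\overset{(b)}{=} \frac{\mathcal{H}(A|U_{j-1}) + \mathcal{H}(B|A \oplus U_{j-1})}{\rho_A + \rho_B} \nonumber \\
&\overset{(c)}{\ge} \frac{\mathcal{H}(A|U_{j-1}) + \mathcal{H}(B|U_j)}{\rho_A + \rho_B} \nonumber \\
&\overset{(d)}{\ge} \frac{\rho_A H(Y_{\ell_{j-1}+1}) + \rho_B H(Y_{\ell_j+1})}{\rho_A + \rho_B} \nonumber \\
&\overset{(e)}{\ge} H(Y_{\ell_{j-1}+1}) = \mathcal{H}_N(U_j|U_{j-1}), \tag{61}
\end{align}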
where (a) follows from Lemma 12, (b), (c) both follow directly from the definition of conditional entropy in (11),
(d) follows from the fact that if $U'$ is a subspace such that $U' \cap U_j = \{0\}$, for any $j$, then

\[
\mathcal{H}_N(U'|U_j) \ge H(Y_{\ell_j+1}). \tag{62}
\]

This will be proved shortly. Finally, (e) follows from the assumption on the ordering of the entropies of $\{Y_i\}$ (see
(20)). Note that equality holds in (e) only if $B = \{0\}$.

We will now prove (62). Let $U' = \langle [Y_1, \ldots, Y_m] T_{U'} \rangle$, for some $(m \times \rho_{U'})$ full rank matrix $T_{U'}$. Column
reduce $T_{U'}$ by selecting, for any column, the last row which has a nonzero entry and using that entry to make
all the other entries in that row zero. Let $S = \{t_1, \ldots, t_{\rho_{U'}}\}$ denote the row indices corresponding to the
identity submatrix which occurs after the column reduction. Since $U' \cap U_j = \{0\}$, it must be true that

\[
t_{i'} \ge \ell_j + 1, \quad 1 \le i' \le \rho_{U'}. \tag{63}
\]

Now, if we let $S^c = \{1, \ldots, m\} \setminus S$, we have

\begin{align}
\mathcal{H}_N(U'|U_j) &= \frac{\mathcal{H}(U'|U_j)}{\rho_{U'}} \nonumber \\
&\overset{(a)}{\ge} \frac{\mathcal{H}(U'|\langle Y_i, i \in S^c \rangle)}{\rho_{U'}} \nonumber \\
&\overset{(b)}{=} \frac{H(Y_{t_1}, \ldots, Y_{t_{\rho_{U'}}})}{\rho_{U'}} \nonumber \\
&\overset{(c)}{\ge} H(Y_{\ell_j+1}), \tag{64}
\end{align}

where (a) follows since $U_j \subseteq \langle Y_i, i \in S^c \rangle$, (b) follows from (63) and (c) follows from the assumption on the
ordering of the entropies of $\{Y_i\}$ (see (20)).

Appendix D 
Proof of Theorem 5

The proof involves two steps, which are outlined next. Each step will be proved subsequently.

Step 1: Consider the chain of subspaces $W^{(0)} \subsetneq W^{(1)} \subsetneq \cdots \subsetneq W^{(r)}$ as obtained from Theorem 3. We will show
that the infimum of the achievable rates for decoding the subspace $W^{(j)}$ under the CC approach (see Section
III-A) is given by

\[
R_{CC}(W^{(j)}) = \mathcal{H}_N\!\left(W^{(j)} \,\middle|\, W^{(j-1)}\right), \quad \forall\, 1 \le j \le r.
\]

Step 2: Next, consider any subspace $W \subseteq V$ such that $W \not\subseteq W^{(j-1)}$ and $W \subseteq W^{(j)}$. We show that an optimal
subspace to decode $W$ under the SS approach is $W^{(j)}$, by showing that for any other subspace $W' \supseteq W$, we have

\[
R_{CC}(W') \ge \mathcal{H}_N\!\left(W^{(j)} \,\middle|\, W^{(j-1)}\right) = R_{CC}(W^{(j)}).
\]



A. Proof of Step 1 

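Proof by induction on $j$. The statement follows for $j = 1$, since from (15), we have

\begin{align}
R_{CC}(W^{(1)}) &= \max_{U_1 \subsetneq W^{(1)}} \mathcal{H}_N(W^{(1)}|U_1) \nonumber \\
&\overset{(a)}{=} \mathcal{H}_N(W^{(1)}), \tag{65}
\end{align}

where (a) follows by applying Lemma 14 to $\{0\} \subsetneq U_1 \subsetneq W^{(1)}$ and noting from Theorem 3 that $W^{(1)}$ has the
least normalized entropy among all subspaces of $V$. Now, assume that the statement is true for $j - 1$, i.e.,

\[
\max_{U_1 \subsetneq W^{(j-1)}} \left\{ \mathcal{H}_N(W^{(j-1)}|U_1) \right\} = \mathcal{H}_N(W^{(j-1)}|W^{(j-2)}). \tag{66}
\]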
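Then, we need to prove that

\[
\max_{U_1 \subsetneq W^{(j)}} \left\{ \mathcal{H}_N(W^{(j)}|U_1) \right\} = \mathcal{H}_N(W^{(j)}|W^{(j-1)}), \tag{67}
\]

which will imply $R_{CC}(W^{(j)}) = \mathcal{H}_N(W^{(j)}|W^{(j-1)})$. For any $U_1 \subsetneq W^{(j)}$, let $A = U_1 \cap W^{(j-1)}$ and let $A^c$ be its
complement in $U_1$. Then

\begin{align}
\mathcal{H}(W^{(j)}|U_1) &= \mathcal{H}(W^{(j)}) - \mathcal{H}(A \oplus A^c) \nonumber \\
&= \mathcal{H}(W^{(j-1)}) + \mathcal{H}(W^{(j)}|W^{(j-1)}) - \mathcal{H}(A) - \mathcal{H}(A^c|A) \nonumber \\
&\overset{(a)}{=} \mathcal{H}(W^{(j-1)}|A) + \mathcal{H}(W^{(j)}|W^{(j-1)}) - \mathcal{H}(A^c|A) \nonumber \\
&\overset{(b)}{\le} \mathcal{H}(W^{(j-1)}|A) + \mathcal{H}(W^{(j)}|W^{(j-1)}) - \mathcal{H}(A^c|W^{(j-1)}) \nonumber \\
&\overset{(c)}{\le} \mathcal{H}(W^{(j-1)}|A) + \left(\rho_{W^{(j)}} - \rho_{W^{(j-1)}} - \rho_{A^c}\right) \mathcal{H}_N(W^{(j)}|W^{(j-1)}) \nonumber \\
&\overset{(d)}{\le} \left(\rho_{W^{(j-1)}} - \rho_A\right) \mathcal{H}_N(W^{(j-1)}|W^{(j-2)}) + \left(\rho_{W^{(j)}} - \rho_{W^{(j-1)}} - \rho_{A^c}\right) \mathcal{H}_N(W^{(j)}|W^{(j-1)}) \nonumber \\
&\overset{(e)}{\le} \left(\rho_{W^{(j-1)}} - \rho_A\right) \mathcal{H}_N(W^{(j)}|W^{(j-1)}) + \left(\rho_{W^{(j)}} - \rho_{W^{(j-1)}} - \rho_{A^c}\right) \mathcal{H}_N(W^{(j)}|W^{(j-1)}) \nonumber \\
&= \left(\rho_{W^{(j)}} - \rho_{U_1}\right) \mathcal{H}_N(W^{(j)}|W^{(j-1)}), \tag{68}
\end{align}

which implies that $\mathcal{H}_N(W^{(j)}|U_1) \le \mathcal{H}_N(W^{(j)}|W^{(j-1)})$. Here, (a) and (b) follow since $A \subseteq W^{(j-1)}$, (c) follows
trivially if $A^c = \{0\}$; else from Theorem 3, $\mathcal{H}_N(W^{(j)}|W^{(j-1)}) \le \mathcal{H}_N(A^c + W^{(j-1)}|W^{(j-1)}) = \mathcal{H}_N(A^c|W^{(j-1)})$.
(d) follows trivially if $A = W^{(j-1)}$; else by the induction hypothesis on $j - 1$ (put $U_1 = A$ in (66)) and finally, (e)
follows since by Theorem 3, $\mathcal{H}_N(W^{(j-1)}|W^{(j-2)}) < \mathcal{H}_N(W^{(j)}|W^{(j-1)})$.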



B. Proof of Step 2 



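\begin{align}
R_{CC}(W') &= \max_{U_1 \subsetneq W'} \left\{ \mathcal{H}_N(W'|U_1) \right\} \nonumber \\
&\overset{(a)}{\ge} \mathcal{H}_N(W'|W' \cap W^{(j-1)}) \nonumber \\
&\overset{(b)}{\ge} \mathcal{H}_N(W'|W^{(j-1)}) \nonumber \\
&\overset{(c)}{=} \mathcal{H}_N(W' + W^{(j-1)}|W^{(j-1)}) \nonumber \\
&\overset{(d)}{\ge} \mathcal{H}_N(W^{(j)}|W^{(j-1)}) \tag{69} \\
&\overset{(e)}{=} R_{CC}(W^{(j)}), \tag{70}
\end{align}

where (a) follows by substituting $U_1 = W' \cap W^{(j-1)}$, (b) follows from Lemma 13, (c) follows from Lemma 12,
(d) follows since by Theorem 3, $W^{(j)}$ is the least normalized entropy subspace conditioned on $W^{(j-1)}$, and finally,
(e) follows from Step 1.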



Appendix E 
Existence of Nested Codes 

For any $k_1 < k_2 < \cdots < k_j$, let $B_\ell$, $\ell = 1, \ldots, j$, denote a random matrix uniformly picked from the set of all
$(k_\ell - k_{\ell-1}) \times n$ matrices over $\mathbb{F}_q$ (note here $k_0 = 0$). The encoding matrix for the $\ell$th stage, $A^{(\ell)}$, is assumed to
have the nested form $A^{(\ell)} = [B_1^t, \ldots, B_\ell^t]^t$, $1 \le \ell \le j$. For any $\ell \le j$, let $W^{(\ell)} = \langle Y_1, \ldots, Y_{\rho_{W^{(\ell)}}} \rangle$. As discussed
in Section IV-B, the $\ell$th stage computes the complement of $W^{(\ell-1)}$ in $W^{(\ell)}$ using $W^{(\ell-1)}$ (all the output up till the
$(\ell - 1)$th stage) as side information. Now, for any fixed set of encoding matrices, let $\mathcal{E}_\ell$ denote the error event up
till the $\ell$th stage, i.e.,

\[
\mathcal{E}_\ell : (\hat{\mathbf{Y}}_1, \ldots, \hat{\mathbf{Y}}_{\rho_{W^{(\ell)}}}) \ne (\mathbf{Y}_1, \ldots, \mathbf{Y}_{\rho_{W^{(\ell)}}}). \tag{71}
\]

Also let $P_{e,\ell}^{(n)}$ denote the probability of error in the $\ell$th stage assuming that all the previous stages were decoded
correctly (i.e., when the $\ell$th stage receives $W^{(\ell-1)}$ as side information). Thus, $P_{e,\ell}^{(n)} = P(\mathcal{E}_\ell|\mathcal{E}_{\ell-1}^c)$. Thus the overall
source averaged probability of error in computing $W^{(j)}$ can be upper bounded as

\begin{align}
P_e^{(n)} &= P(\mathcal{E}_j) \tag{72} \\
&\le P(\mathcal{E}_{j-1}) + P(\mathcal{E}_j|\mathcal{E}_{j-1}^c) \tag{73} \\
&= P(\mathcal{E}_{j-1}) + P_{e,j}^{(n)} \tag{74} \\
&\le \sum_{\ell=1}^{j} P_{e,\ell}^{(n)}, \tag{75}
\end{align}

where the last equation follows by repeating steps from (72)-(74). Averaging $P_e^{(n)}$ further over the ensemble of
encoding matrices, we get



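\begin{align}
\langle P_e^{(n)} \rangle &\le \sum_{A^{(j)}} P_{\mathbf{A}^{(j)}}(A^{(j)}) \sum_{\ell=1}^{j} P_{e,\ell}^{(n)} \nonumber \\
&= \sum_{\ell=1}^{j} \sum_{A^{(\ell)}} P_{\mathbf{A}^{(\ell)}}(A^{(\ell)}) P_{e,\ell}^{(n)}, \tag{76}
\end{align}

where, in the last equation, we have interchanged the order of the two summations and also used the fact that stage
$\ell$ depends only on $B_1, \ldots, B_\ell$, and hence $P_{\mathbf{A}^{(j)}}(A^{(j)})$ can be marginalized over the incremental matrices of the
remaining stages to get $P_{\mathbf{A}^{(\ell)}}(A^{(\ell)})$. But now, from the achievability proof of Theorem 7, we know that the inside
term $\sum_{A^{(\ell)}} P_{\mathbf{A}^{(\ell)}}(A^{(\ell)}) P_{e,\ell}^{(n)} \to 0$ if

\[
\frac{k_\ell}{n} \log q > \mathcal{H}_N\!\left(W^{(\ell)} \,\middle|\, W^{(\ell-1)}\right), \quad \forall\, 1 \le \ell \le j. \tag{77}
\]

This proves the existence of the nested codes as claimed.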



